Modeling Others using Oneself in Multi-Agent Reinforcement Learning

Authors

  • Roberta Raileanu
  • Emily L. Denton
  • Arthur Szlam
  • Rob Fergus
Abstract

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players’ hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent’s actions and update its belief of their hidden state in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players’ hidden states, in both cooperative and adversarial settings.
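The core idea in the abstract, reusing one's own policy to explain another agent's actions and revising a belief about that agent's hidden goal online, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not taken from the paper: the 1-D world, the hypothetical `own_policy` function, the two candidate goals, and the discrete Bayesian update (the abstract does not specify the update rule).

```python
import math

MOVES = [-1, 0, 1]  # action 0 = left, 1 = stay, 2 = right (assumed toy action set)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def own_policy(position, goal):
    """Hypothetical policy of the modeling agent on a 1-D line:
    actions that end closer to `goal` get higher probability."""
    scores = [-abs((position + move) - goal) for move in MOVES]
    return softmax(scores)

def update_belief(belief, position, observed_action, goals):
    """One online step: ask 'how likely would I be to take this action
    if I had each candidate goal?' and reweight the belief (Bayes rule)."""
    likelihoods = [own_policy(position, g)[observed_action] for g in goals]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# The other agent starts at position 5 and is seen moving right three times.
goals = [0, 10]          # assumed candidate hidden goals
belief = [0.5, 0.5]      # uniform prior over the other agent's goal
position = 5
for action in [2, 2, 2]:
    belief = update_belief(belief, position, action, goals)
    position += MOVES[action]

print(belief)  # belief mass shifts toward goal 10
```

In this toy setting the belief concentrates on the goal that best explains the observed moves under the agent's own policy; a learned policy and a continuous goal estimate would replace the hand-written pieces in a realistic setup.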

Similar resources

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...

Voltage Coordination of FACTS Devices in Power Systems Using RL-Based Multi-Agent Systems

This paper describes how multi-agent system technology can be used as the underpinning platform for voltage control in power systems. In this study, some FACTS (flexible AC transmission systems) devices are properly designed to coordinate their decisions and actions in order to provide a coordinated secondary voltage control mechanism based on multi-agent theory. Each device here is modeled as ...

Simulation of Self-Control through Precommitment Behaviour in an Evolutionary System

The purpose of this thesis is to determine how evolution has resulted in self-control through precommitment behaviour. Empirical data in psychology suggest that we recognize we have self-control problems and attempt to overcome them by exercising precommitment, which biases our future choices toward a larger, later reward. The behavioral model of self-control as an internal process is taken from psyc...

Applications of Game theory in multi-agent reinforcement learning

Multi-agent systems are a fast-growing paradigm for problem solving, and their applications are growing every day. Adaptivity, which involves learning, is one of the key features of a multi-agent system. Unfortunately, due to the extreme complexity of the environment in which the agents interact and the effect of each one's actions on the others, multi-agent learning is still an open problem. In this pap...

Multi-Agent Evolutionary Game Dynamics and Reinforcement Learning Applied to Online Optimization of Traffic Policy

This chapter demonstrates an application of agent-based selection dynamics to the traffic assignment problem. We introduce an evolutionary dynamic approach that acquires payoff data from multi-agent reinforcement learning to enable an adaptive optimization of traffic assignment, provided that classical theories of traffic user equilibrium pose the problem as one of global optimization. We then s...

Journal:
  • CoRR

Volume: abs/1802.09640

Publication date: 2018